A hybrid binary dwarf mongoose optimization algorithm with simulated annealing for feature selection on high dimensional multi-class datasets

The dwarf mongoose optimization (DMO) algorithm, developed in 2022, was applied to continuous mechanical engineering design problems, where as a metaheuristic approach it showed a considerable balance between the exploration and exploitation phases. Still, the DMO is restricted in its exploitation phase, which somewhat hinders the algorithm's optimal performance. In this paper, we propose a new hybrid method called the BDMSAO, which combines a binary variant of the DMO (BDMO) with the simulated annealing (SA) algorithm. In the hybrid BDMSAO algorithm, the BDMO serves as the global search method and the SA as the local search component, which enhances the limited exploitative mechanism of the BDMO. The new hybrid algorithm was evaluated using eighteen (18) UCI machine learning datasets of low and medium dimensions. The BDMSAO was also tested using three high-dimensional medical datasets to assess its robustness. The results show the efficacy of the BDMSAO in solving challenging feature selection problems on datasets of varying dimensions, and that it outperforms ten other methods in the study. Specifically, the BDMSAO produced the highest classification accuracy on 61.11% of the datasets and reached 100% accuracy on 9 of 18 datasets. It also yielded the maximum accuracy obtainable on the three high-dimensional datasets while achieving competitive performance regarding the number of features selected.

Technological advancement in various fields of endeavor has resulted in a large amount of data being generated in the information industry. The massive data available today can only be meaningful if there are corresponding tools that can transform these data into information with little effort. Data mining and machine learning are potent tools in this regard, and their use for transforming massive data into meaningful information has grown tremendously. However, this large amount of data comes with redundancies, noise, and many features, which may hinder knowledge discovery activities and degrade, for example, a classifier's performance.
Knowledge discovery (KD) activities consist of repeatedly performing data cleaning, dimensionality reduction, data integration and transformation, and many other activities. These activities form part of the pre-processing tasks, without which the performance of data mining and machine learning algorithms would be significantly affected. Data is so important nowadays that it is regarded as the 'new currency'. Careful handling of the 'new currency' is required, making data mining and machine learning a fast-growing field. The dimension of the vast data available is of great concern for data miners because it impacts their ability to transform the data into meaningful information. Many data mining and machine learning tools require considerable time to carry out their tasks, and noisy data with redundant features increases the time complexity of these algorithms. To resolve this problem, the pre-processing step of feature selection becomes crucial, since it can impact the performance of learning algorithms. Feature selection therefore plays a notable role in many research areas 1,2.
Feature selection has become a prominent approach for removing irrelevant and unnecessary features, that is, attributes that do not aid classification but add to the computational cost and the space requirement. This process is often categorized into wrapper and filter approaches. The first employs one or more learning algorithms to extract a relevant subset of features, while the latter is ...
"The proposed hybrid method" presents the proposed method considering its solution representation, fitness function, and computational complexity. In "Experimental results and discussion", the experimental results of this study are discussed, and the statistical analysis test results are presented. "Testing on high-dimensional datasets" discusses the use of the proposed method on high-dimensional datasets to show its robustness. Finally, "Conclusion and future work" concludes this work by giving its limitations and future direction.

Related work
Recently, metaheuristic algorithms have gained ground in solving optimization problems, and researchers regularly make substantial improvements to these methods. Metaheuristic algorithms have become pivotal in finding optimal solutions through iterative learning. In 33, metaheuristic algorithms were divided into population-based and single solution-based. Also, 34 categorized these algorithms into non-nature-inspired and nature-inspired metaheuristics. Many researchers have developed hybrid forms of metaheuristic algorithms to solve feature selection problems. The hybridized methods have proven their superior performance in solving practical and real-world problems 29. In the first hybrid metaheuristic method, the Genetic Algorithm (GA) was combined with a local search algorithm to solve the optimization problem of feature selection 35. From the inspirational viewpoint, metaheuristic algorithms can be generally grouped into swarm-based, physics-based, evolutionary-based, and human-based.
Swarm-based algorithms. Algorithms in this group are inspired by the social interaction or behaviour of birds, animals, insects, fish schools, herds and so on. The main underlying idea of these algorithms is that each individual has its own behaviour, but coming together as a group and harnessing the joint effort enables the individuals to solve very complex optimization problems. Several algorithms have been developed in this category in the last two decades, and researchers have also developed variants of some popular ones. Others have hybridized them, or are still doing so, to solve various optimization problems. One of the prominent ones is the PSO 17, which has gained much attention due to its rich mathematical basis for solving problems. Other algorithms in this group are Cuckoo Search (CS) 21, Grey Wolf Optimizer (GWO) 23, Krill Herd Algorithm (KH) 24, Whale Optimization Algorithm (WOA) 36, Dwarf Mongoose Optimization (DMO) algorithm 37, Gazelle Optimization Algorithm (GOA) 38, etc. As one of the most notable algorithms in the swarm-based category, the PSO has also been extensively hybridized to solve the feature selection problem. In 39, a local search algorithm was employed to assist the PSO in searching for the optimal solution and selecting the minimum reducts in relation to their correlation information. Talbi et al. 40 proposed a wrapper-based hybrid GA-SA method called GPSO using SVM as the classifier, and the work of 41 presented a multi-objective and hybrid mutation operator; both were applied to the classification of microarray data. The study in 42 presented a novel hybridization of the GA with PSO for optimizing feature sets on digital mammogram datasets. The studies in 43,44 presented two different wrapper-based feature selection methods that hybridized the GA with the Ant Colony Optimiser (ACO). Another study 45 ...
Evolutionary-based algorithms. Algorithms that fall under this category are inspired by nature or by the biological process of evolution and begin by randomly generating their population of solutions. The foremost algorithm in this category is the Genetic Algorithm (GA) 18, which generates its fittest individual using mutation and crossover in every generation. The GA has attracted a lot of attention: different variants have been created, and improved versions have been employed to solve many real-world problems. Other popular algorithms developed in this group include genetic programming 47, tabu search 48, evolution strategy, differential evolution, flower pollination algorithm 49, memetic algorithm 50, Biogeography-Based Optimization 51, and more.
Apart from the presentation of these evolutionary metaheuristic algorithms, the GA, being the prominent algorithm in the evolutionary category, has attracted significant attention and has been hybridized with other methods to solve different optimization problems 52-55. Those studies revealed its potency in producing better output than either local or global search models alone. It has also been widely hybridized in the domain of feature selection. The GA was combined with the SA as a filter approach 56 to enhance the GA's local search capability in solving the feature selection problem. This method was evaluated using eight datasets from the UCI machine learning repository, and it performed better than other popular methods in selecting the minimum subset of features. Also, in 57, the authors proposed a memetic feature selection algorithm that utilized fuzzy logic to control the major parameters of two local search techniques, which were then combined with the GA. In a wrapper-based method, the crossover operator of the GA was combined with the Metropolis acceptance criterion of the SA 58. Furthermore, the GA was hybridized in 59 to classify power disturbances in the Power Quality (PQ) problem, where it also optimized the SVM parameters. Moreover, in 60, it was combined with Tabu Search, and the Fuzzy ARTMAP Neural Network was employed to evaluate the wrapper feature selection method.
Physics-based algorithms. These algorithms draw their inspiration from the laws of physics in the world.
Physics-based methods are inspired by principles ranging from physics, chemistry, music, complex dynamic systems and metallurgy to mathematics 1 ... (GREO) algorithm, which was applied in speech emotion recognition. Also, the study conducted in 67 presented a hybrid feature selection method based on the ReliefF filter technique and EO, known as RBEO-LS, which has two phases: the first employed the ReliefF algorithm at the pre-processing stage for feature weight assignment, and the second utilized binary EO (BEO) as a wrapper search technique.
Human-based algorithms. Algorithms in this category are inspired by activities performed by humans or by human behaviours. Human beings are involved in various activities that affect their performance, and researchers use these behaviours to develop algorithms. The most prevalent algorithms here are Teaching Learning-Based Optimization (TLBO) 68 and the League Championship Algorithm (LCA) 69. Others include the Exchange Market Algorithm (EMA) by Ghorbani & Babaei 70, the Social-Based Algorithm (SBA) 71, the Seeker Optimization Algorithm (SOA) 72, etc. It is observed from the literature that not many algorithms are human-inspired. TLBO, a well-known method in this category, was hybridized in 73 with extreme learning machines (ELM), referred to as TLBO-ELM, to solve data classification problems, under which feature selection falls. It was tested on some UCI benchmark datasets.
With the strength of metaheuristic algorithms come some issues, among which is premature convergence, which results in locating only a limited set of optimal solutions. Frequently, researchers combine these algorithms with other methods such as local search techniques. Generally, a local search algorithm conducts an intensive search of each region of the solution space, which can outperform existing metaheuristic solutions. Among the existing local search methods are Simulated Annealing (SA) 30 ...

Preliminaries
Dwarf Mongoose Optimization Algorithm. DMO 37 is a population-based stochastic metaheuristic algorithm inspired by the foraging and social behaviour of the dwarf mongoose, also called Helogale. Each dwarf mongoose searches for its own food, since capturing food is not a collective exercise, but the unit forages together as a group. Due to the seminomadic nature of these animals, a sleeping mound is built close to an abundant source of food. The algorithm mathematically models the lifestyle of this animal to solve optimization problems.
All population-based optimization algorithms commence with random initialization. After that, guided by the intensification and diversification rules, the solutions gather around the global optimum. Similarly, the DMO starts by initializing the candidate population of mongooses, which is generated stochastically between the lower and upper bounds of the given problem, as in Eq. (1).
$$X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,d-1} & x_{1,d} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,d-1} & x_{2,d} \\ \vdots & \vdots & x_{i,j} & \vdots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,d-1} & x_{n,d} \end{bmatrix} \tag{1}$$
where $X$ represents the present population of candidates, which are generated randomly using Eq. (2), $x_{i,j}$ indicates the position of the jth dimension of the ith candidate, $n$ indicates the population size, and $d$ is the dimension of the problem.
$$x_{i,j} = unifrnd(VarMin, VarMax, VarSize) \tag{2}$$
where unifrnd is a uniformly distributed random number, VarMin and VarMax are the lower bound and upper bound, respectively, and VarSize is the dimension of the problem. The best solution at each iteration is the best solution obtained so far.
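The initialization step can be illustrated with a minimal sketch. It assumes that each position is drawn independently and uniformly between the bounds; the function name `init_population`, the `seed` argument and the example values are hypothetical and are not taken from the authors' implementation.

```python
import random

def init_population(n, d, var_min, var_max, seed=None):
    """Return n candidate positions, each a list of d values drawn
    uniformly from [var_min, var_max] (the role played by unifrnd)."""
    rng = random.Random(seed)
    return [[rng.uniform(var_min, var_max) for _ in range(d)]
            for _ in range(n)]

# Illustrative values only: 10 mongooses, 5 dimensions, bounds [0, 1].
population = init_population(n=10, d=5, var_min=0.0, var_max=1.0, seed=42)
```

Each row of `population` plays the role of one row of X, so `population[i][j]` corresponds to the position of the jth dimension of the ith candidate.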
Like every metaheuristic algorithm, there are two phases in the DMO: exploitation (individual mongoose carries out a thorough search in each search space), also called intensification and exploration (a random search for a new abundant food source or new sleeping mound) or diversification. Three major social structures of the DMO carry out the activities of the two phases mentioned: the alpha group, scout group, and babysitters.
The alpha female (α) is the family unit controller and is selected using Eq. (3).
Here, n − bs corresponds to the number of mongooses in the alpha group, bs denotes the number of babysitters, and peep represents the sound the alpha female makes to keep the other unit members on path.
The sleeping mound is determined by abundant food, which is expressed in Eq. (4),
where phi is a uniformly distributed random number in [−1, 1]. After each iteration, the sleeping mound is evaluated as given in Eq. (5).
When a sleeping mound is found, an average value is derived using Eq. (6). Once the babysitter exchange criterion is attained, the next phase is scouting, which evaluates the next sleeping mound determined by another food source.
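For reference, Eqs. (3) to (6) can be written as follows. This is a sketch that assumes the standard formulation of the original DMO 37; the symbols $fit_i$ (fitness of the ith mongoose), $sm_i$ (sleeping mound value) and $\varphi$ (average sleeping mound value) are taken from that formulation and are assumptions here.

```latex
\begin{align}
\alpha &= \frac{fit_i}{\sum_{i=1}^{n} fit_i} \tag{3} \\
X_{i+1} &= X_i + phi \times peep \tag{4} \\
sm_i &= \frac{fit_{i+1} - fit_i}{\max\{|fit_{i+1}, fit_i|\}} \tag{5} \\
\varphi &= \frac{\sum_{i=1}^{n} sm_i}{n} \tag{6}
\end{align}
```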
Since mongooses are known not to return to a prior sleeping mound, the scout group searches for the next sleeping mound to ensure exploration. In the DMO, the mongooses forage and scout simultaneously, on the justification that the farther the unit forages, the greater the likelihood of finding the next sleeping mound. This is simulated using Eq. (7),
where rand is a random number in [0, 1], and $CF = \left(1 - \frac{iter}{Max_{iter}}\right)^{\left(2\,\frac{iter}{Max_{iter}}\right)}$ is the parameter that directs the collective-volatile movement of the mongoose group and decreases over the iterations.
$\vec{M}$ connotes the vector that motivates the movement of the mongooses to a new sleeping mound. The babysitter group remains with the juveniles while the scouting and foraging group searches for a sleeping mound and food source. The members of this group are excluded from the total candidate population, as they do not forage or scout until the babysitter exchange parameter is met in Eq. (7). In the following section, the proposed binary variant of DMO is presented with the hybrid method of simulated annealing (SA).
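The scouting step can be sketched as below. The sketch assumes the update rule of the original DMO 37, in which the new position is X_i − CF·phi·rand·(X_i − M) when the average sleeping mound value has increased and X_i + CF·phi·rand·(X_i − M) otherwise. The function names and the choice to pass the movement vector M in as an argument are hypothetical.

```python
import random

def control_factor(iteration, max_iter):
    """CF = (1 - iter/Max_iter) ** (2 * iter/Max_iter)."""
    ratio = iteration / max_iter
    return (1.0 - ratio) ** (2.0 * ratio)

def scout_update(x, m, phi_new, phi_old, iteration, max_iter, rng):
    """Move position x relative to the movement vector m.

    phi_new / phi_old are the current and previous average sleeping
    mound values; the step is subtracted when the value has increased
    and added otherwise (assumed form of Eq. 7).
    """
    cf = control_factor(iteration, max_iter)
    phi = rng.uniform(-1.0, 1.0)   # uniformly distributed in [-1, 1]
    r = rng.random()               # rand in [0, 1)
    sign = -1.0 if phi_new > phi_old else 1.0
    return [xj + sign * cf * phi * r * (xj - mj) for xj, mj in zip(x, m)]

# Illustrative call with made-up values.
rng = random.Random(0)
new_x = scout_update([0.2, 0.8, 0.5], [0.4, 0.4, 0.4],
                     phi_new=0.3, phi_old=0.1,
                     iteration=10, max_iter=50, rng=rng)
```

With this form, CF equals 1 at the first iteration and 0 at the last, so the scouting steps shrink as the run progresses.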

The proposed hybrid method
In this section, the representation of the solution, the fitness function utilized and the computational complexity of the proposed method are elaborated. Feature selection is an optimization problem represented in binary form, with its solutions confined to 0s and 1s, and this is handled by the BDMO. The agents update their positions in each iteration using the BDMO optimization rules and afterwards pass the solution to the SA, which searches the neighborhood for a better solution to improve and refine the results. Feature selection is also a multi-objective optimization problem in which two opposing objectives need to be met: high classification accuracy and as few selected features as possible. How well these two objectives are achieved determines the quality of a solution. In the proposed method, the BDMO utilizes the tournament selection mechanism to advance the algorithm's diversification capability, since it gives weak solutions a good chance of being selected while searching for promising ones.
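Tournament selection can be illustrated with a short sketch. It assumes a minimization setting (lower fitness is better) and a tournament of size k drawn without replacement; the function name and the default k = 2 are hypothetical, as the source does not state the tournament size.

```python
import random

def tournament_select(fitness, k=2, rng=None):
    """Draw k distinct candidates at random and return the index of
    the one with the lowest (best) fitness value."""
    rng = rng or random.Random()
    contenders = rng.sample(range(len(fitness)), k)
    return min(contenders, key=lambda i: fitness[i])

# Illustrative fitness values for four agents (lower is better).
fitness_values = [0.30, 0.10, 0.50, 0.20]
winner = tournament_select(fitness_values, k=2, rng=random.Random(3))
```

Because only k randomly drawn agents compete, a weak agent wins whenever the stronger agents are not drawn into its tournament, which is how the mechanism preserves diversity.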
The DMO is a recent metaheuristic algorithm that has proved its efficacy in solving mechanical optimization problems. The algorithm employs a Tau operator, which signifies that, if a new food source is not found, intensification should be performed regardless of the fitness values of the present solution and the one being operated upon (Eq. 7). This operator was replaced with the SA as a local search technique that takes an initial state of a solution, processes it, and puts the improved solution in place of the original one. This replacement constitutes the hybridization of the BDMO with the SA as a local search method.
Solution representation. Feature selection being a binary optimization problem, each element of its output takes a value in {0, 1}. A zero indicates that the feature is redundant or irrelevant and is therefore rejected, while a one signifies that the feature is useful and therefore selected. The possibility that position values fall outside this range cannot be ruled out. Therefore, a binarization function is applied to every agent to ensure that it remains within the specified range. This is performed using Eq. (8). For a feature to be selected, its position value must be 0.5 or above, which is rounded to 1; for a feature to be rejected, its position value must be less than 0.5, which is rounded down to 0.
$$BestSol_i^d = \begin{cases} 1, & \text{if } BestSol_i^d \ge 0.5 \\ 0, & \text{if } BestSol_i^d < 0.5 \end{cases} \tag{8}$$
where $BestSol_i^d$ is the best solution $i$ in dimension $d$. Thus, the larger a mongoose's position value in a dimension, the more likely the corresponding feature is to be selected 79.
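The thresholding rule described above can be sketched in a few lines. The function names are hypothetical; the only behaviour taken from the text is that values of 0.5 and above map to 1 and values below 0.5 map to 0.

```python
def binarize(position, threshold=0.5):
    """Map a continuous position vector to a 0/1 feature mask:
    values >= threshold become 1 (selected), the rest 0 (rejected)."""
    return [1 if value >= threshold else 0 for value in position]

def selected_indices(mask):
    """Return the indices of the features kept by the mask."""
    return [j for j, bit in enumerate(mask) if bit == 1]

# Illustrative position, including values outside [0, 1].
mask = binarize([0.73, 0.12, 0.50, 1.40, -0.20])
kept = selected_indices(mask)
```

Here `mask` is `[1, 0, 1, 1, 0]` and `kept` is `[0, 2, 3]`, which shows that out-of-range values such as 1.40 and -0.20 are also brought back into the binary range.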
Fitness function. Selecting useful features that help the classifier recognize the class of a sample in a dataset is challenging. During the selection of relevant features, the redundant ones need to be removed automatically, and the classification accuracy obtained with the selected features needs to be maximized 80. In this work, the BDMSAO is utilized to locate the best feature subset, and the KNN classifier is employed to calculate the classification accuracy. Let $A_c$ be the classification accuracy obtained by the classifier, $d_s$ the dimension of the feature subset, and $D_t$ the total number of attributes contained in the dataset. The classification error is therefore $1 - A_c$, and the proportion of selected features is $d_s / D_t$. Hence, the fitness function is defined as:
$$Fitness = \mu \, (1 - A_c) + (1 - \mu) \, \frac{d_s}{D_t} \tag{9}$$
where $\mu \in [0, 1]$ is the weight assigned to the classification error.
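As a worked illustration of the fitness computation, the sketch below assumes the common weighted-sum form μ(1 − A_c) + (1 − μ)·d_s/D_t, where the complement weight (1 − μ) on the feature ratio is an assumption. The function name and the default μ = 0.99 are hypothetical and are not the parameter setting reported in this study.

```python
def fitness(accuracy, n_selected, n_total, mu=0.99):
    """Weighted sum of classification error and feature ratio.

    accuracy   : A_c, classification accuracy in [0, 1]
    n_selected : d_s, number of selected features
    n_total    : D_t, total number of features in the dataset
    mu         : weight of the classification error (assumed value)
    """
    error = 1.0 - accuracy
    feature_ratio = n_selected / n_total
    return mu * error + (1.0 - mu) * feature_ratio

# Illustrative numbers: 95% accuracy using 5 of 20 features.
value = fitness(accuracy=0.95, n_selected=5, n_total=20, mu=0.99)
```

For these inputs the two terms are 0.99 × 0.05 = 0.0495 and 0.01 × 0.25 = 0.0025, giving a fitness of about 0.052; lower values indicate better solutions.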
The BDMSAO algorithm. The binary version of DMO is proposed in this study to solve the problem of feature selection for many benchmark datasets. The aim is to investigate the performance of the new hybrid algorithm in solving the challenging problem of selecting minimal features from high-density datasets. The resulting BDMO algorithm is supported by the SA method to boost its local search. The evaluation of the fitness function in the proposed BDMO method utilizes Eqs. (10)-(12).
where $xf_i$ represents the number of items in a normalized value of $x_i$, and $xs_i$ is applied to compute the actual fitness value of $x_i$, which is assigned to $f_i$.
Algorithm 1 is a detailed listing of the pseudocode for the proposed hybrid algorithm BDMSAO. The search mechanism of the alpha group in the DMO is now replaced by the SA for an improved local search operation. The algorithm accepts datasets prepared as trainX, testX, trainy and testy and, in addition, the population size, the number of iterations, and the dimension of the population. Using these inputs, the population is initialized and the fitness value of each individual in the population is computed. During the iteration, three search processes are carried out: the SA-based search, the scout group-based search and the babysitter-based search. These three-level searches apply nature-specific operations that demonstrate the balance between the exploration and exploitation phases of the proposed BDMSAO algorithm. Specifically, the SA is adapted to improve the local search of the BDMO. Once the three stages of the search process are completed, the global best solution is identified so that the classification accuracy for the solution can be computed using the datasets. The algorithm returns the number of features selected, the classification accuracy obtained with the selected features, and the best solution.
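The SA-based local search stage can be sketched as follows. This is a generic simulated annealing routine over binary feature masks, not the authors' Algorithm 1: the single-bit-flip neighbourhood, the geometric cooling schedule and the parameters `t0`, `cooling` and `steps` are assumptions, and the Hamming-distance objective is a toy stand-in for the classifier-based fitness.

```python
import math
import random

def flip_one_bit(mask, rng):
    """Return a neighbour of mask that differs in one random position."""
    neighbour = list(mask)
    j = rng.randrange(len(neighbour))
    neighbour[j] = 1 - neighbour[j]
    return neighbour

def simulated_annealing(mask, evaluate, t0=1.0, cooling=0.9, steps=50, rng=None):
    """Minimize evaluate(mask) starting from the given binary mask."""
    rng = rng or random.Random()
    current, current_fit = list(mask), evaluate(mask)
    best, best_fit = list(current), current_fit
    temperature = t0
    for _ in range(steps):
        candidate = flip_one_bit(current, rng)
        candidate_fit = evaluate(candidate)
        delta = candidate_fit - current_fit
        # Metropolis criterion: always accept improvements, accept
        # worse moves with probability exp(-delta / temperature).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_fit = candidate, candidate_fit
            if current_fit < best_fit:
                best, best_fit = list(current), current_fit
        temperature *= cooling  # geometric cooling (assumed schedule)
    return best, best_fit

# Toy objective: Hamming distance to a hypothetical ideal mask.
target = [1, 0, 1, 1, 0, 0, 1, 0]

def hamming_to_target(mask):
    return sum(a != b for a, b in zip(mask, target))

start = [0] * len(target)
best, best_fit = simulated_annealing(start, hamming_to_target,
                                     rng=random.Random(7))
```

In the hybrid setting, `evaluate` would be the wrapper fitness of a mask, and the mask passed in would be the solution handed over by the BDMO.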
Complexity of computation. The computational complexity of every metaheuristic algorithm depends on the time each candidate takes to update its position, the maximum number of iterations, other operations such as sorting or comparison, and the variable update time. The computational complexity of the BDMSAO is $O(Max_{iter} \times Pop_{size} \times Dim_s \times T_{fitness})$, where $Max_{iter}$ is the maximum number of iterations, $Pop_{size}$ is the population size, $Dim_s$ denotes the search space dimension, and $T_{fitness}$ represents the time the classifier requires to calculate the fitness of a given solution. The SA is employed to locate better solutions if they can be found in the neighborhood of the present solution. In terms of O-notation, the SA does not significantly affect the computational cost.
The optimization steps of the developed hybrid BDMSAO algorithm for solving feature selection problems are presented in Fig. 1. In the figure, the first step of the hybrid BDMSAO is defining all the parameters (those of both the BDMO and the SA algorithms). The next step is to generate the population representing a set of solutions for the feature selection problem. Subsequently, the fitness of each candidate solution is determined by evaluating and selecting the best features, after which the current best solution is identified and retained. The next step for the BDMSAO algorithm is to update the current population using either the BDMO or the SA algorithm, depending on the quality of the fitness function.
The process is such that if the probability of the fitness function for the current solution is greater than 0.5, the BDMO is selected for the update. Otherwise, the SA algorithm is used to update the current population. Note that the probability above is computed as a factor of the position index (P_index) being >= 0.5. Thereafter, the fitness of each solution is computed using Eq. (9), and the best solution is determined after updating the population. The next step is for the BDMSAO to check whether the stopping criteria have been met; if so, the algorithm returns the overall best candidate solution. Otherwise, the algorithm repeats the previous steps, starting from the check of whether P_index is >= 0.5, until the stop condition is reached.

Experimental results and discussion
Dataset (low, medium and high-dimensional). To evaluate the performance of the BDMSAO, eighteen University of California Irvine (UCI) low, medium and high-dimensional datasets and two high-dimensional datasets from the Arizona State University feature selection repository were used. The details of the datasets, including their number of features (N), instances, classes, and categories, are presented in Table 1. The high-dimensional datasets contain at least two thousand (2000) features, and a few of the datasets are multi-class in nature, ranging from 3 to 9 classes. These high-dimensional datasets usually represent real-world scenarios and are therefore more challenging, which allows us to ascertain the robustness of the proposed feature selection method. There are studies in the literature where feature selection methods were applied to high-dimensional datasets, one of which is the work presented in 81. However, the maximum number of features in those datasets was limited to N = 4703, compared to the higher-dimensional feature sizes utilized in the current study. Not many metaheuristic algorithms perform reasonably on high-dimensional and multi-class datasets.
Experimental setup. The proposed BDMSAO was implemented in Python. Parameters often play a key role in determining the outcome of multi-agent algorithms, particularly the number of agents and the total number of iterations, which heavily influence the algorithm's performance. Therefore, the experiment was performed with different population sizes to determine a suitable population size and number of iterations. To test the efficiency of this hybrid approach, we compared the proposed BDMSAO with the BDMO. The classification accuracy and number of features selected are shown in Tables 3 and 5 for population sizes from 10 to 50. The convergence graphs for both methods are also depicted in Fig. 2 to show the solution's optimal position over the total of 50 iterations. For a fair comparison, each dataset was run 10 times, and the average values of the runs were taken. The computer configuration for this implementation is a Core i7, 3.60 GHz CPU with 16 GB RAM. The findings of this experiment reveal that the population size of 10 produced better results, and it therefore serves as the basis for comparison in this study. Table 2 presents the parameter settings for the developed hybrid FS methods.
Result and discussion. This subsection discusses the results generated by the BDMSAO and the BDMO, which were evaluated using eighteen datasets from the UCI repository, with details in Table 1. Since the proposed method is a wrapper-based approach, a classifier is required; the K-Nearest Neighbor (KNN) classifier is utilized because it is well known and the most widely used classifier in wrapper-based feature selection 82, and it was used with K = 5 in the experiment. The generated results show that the BDMSAO outperforms the binary DMO. The outcome in Tables 3-4 indicates the efficacy of the proposed hybrid method over the BDMO in locating better solutions. We can conclude that the BDMSAO performed better on the UCI datasets than the BDMO. A critical inspection of the results in Table 3 indicates that the BDMSAO generates better results than the BDMO on all datasets. The classification accuracy produced is greater than 90% on 16 of 18 datasets (88.89%), the exceptions being Exactly2 and Tic-tac-toe, and it reached 100% on 9 of 18 datasets (50%). In the number of features selected, the BDMSAO selected fewer features on 6 datasets (BreastEW, CongressEW, HeartEW, SpectEW, Wine and Zoo) and the same number of features on 2 datasets (Exactly and M-of-n), while the BDMO selected fewer features on 11 datasets.
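The wrapper evaluation with KNN (K = 5) can be illustrated with a small, dependency-free sketch. It assumes squared Euclidean distance restricted to the selected features and majority voting; the function names and the toy data are hypothetical and are not drawn from the datasets of this study.

```python
from collections import Counter

def knn_predict(train_x, train_y, sample, mask, k=5):
    """Predict the label of sample from its k nearest training rows,
    measuring distance only on features where mask is 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    scored = sorted(
        ((sum((row[j] - sample[j]) ** 2 for j in cols), label)
         for row, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

def knn_accuracy(train_x, train_y, test_x, test_y, mask, k=5):
    """Fraction of test samples classified correctly under the mask."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    hits = sum(knn_predict(train_x, train_y, s, mask, k) == y
               for s, y in zip(test_x, test_y))
    return hits / len(test_y)

# Toy data: feature 0 separates the classes, feature 1 is noise.
train_x = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0], [0.3, 3.0], [0.4, 7.0],
           [10.0, 5.2], [10.2, 1.1], [10.1, 9.1], [10.3, 3.2], [10.4, 7.1]]
train_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
test_x = [[0.25, 8.0], [10.25, 2.0]]
test_y = [0, 1]

acc_informative = knn_accuracy(train_x, train_y, test_x, test_y, mask=[1, 0])
acc_noise = knn_accuracy(train_x, train_y, test_x, test_y, mask=[0, 1])
```

On this toy data, keeping only the informative feature classifies both test samples correctly, while keeping only the noisy feature does not, which is the signal the wrapper passes back to the search through the fitness function.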
The convergence behavior of both methods is shown in Fig. 2. It is observed that each algorithm converges steadily on all datasets. However, the BDMSAO achieved better convergence, which shows its superiority over the BDMO.
Furthermore, the optimization pattern of the fitness function on the defined problem space was investigated, and the results obtained were plotted for comparison. The plots are grouped by the dataset used for experimentation, so that each graph presents a comparative outline of curves for some selected algorithms considered in this study. The values plotted are the fitness values over all iterations for each dataset and each optimization algorithm. Again, these plots illustrate the convergence pattern of each algorithm on the different datasets. The BDMO, BDMSAO, GNDO-SA, ASGW, HSGW, AIEOU, WOA, RSGW, RTHS, and PSO algorithms were considered in the convergence plots. Figure 2 shows the convergence plots for the ionosphere, congressEW, Exactly, Exactly2, and Vote datasets. For the ionosphere dataset, the curves of GNDO-SA, ASGW, HSGW, AIEOU, WOA, RSGW, RTHS, and PSO run below those of BDMO and BDMSAO. A similar pattern is repeated for the congressEW, Exactly, and Exactly2 datasets. Interestingly, for the Vote dataset, the PSO algorithm, together with the BDMO algorithm, performs better than all other algorithms. The curves of BDMO and BDMSAO overlap on the congressEW and Exactly2 datasets, reflecting their competitive performance, but are slightly separated on the ionosphere, Exactly, and Vote datasets. The highest fitness value of 1.0, obtained for all 50 iterations, is reported by PSO on the Vote dataset. The next highest values, of 0.9 and above, were those of BDMO and BDMSAO on the ionosphere, congressEW, and Vote datasets.
The BDMO and BDMSAO algorithms demonstrate superior performance when compared with all the similar methods on the five datasets shown in the figure. This shows that the algorithms are suitable for finding the minimum number of features required for classifying class distributions in the datasets.
In Fig. 3, the convergence curves of the ASGW algorithm for the colon, HeartEW, BreastEW, BreastCancer, and Lymphography datasets are seen to spike with a measure of instability from the first iteration to the last. Those of GNDO-SA, HSGW, RTHS, RSGW, WOA, and AIEOU are poorly fitted, as the convergence curves of these algorithms lie far below those of PSO, BDMO, and BDMO-SA. Note that on all datasets listed in the figure, PSO, BDMO, and BDMO-SA converge competitively above all other methods, confirming the superiority of the three methods over the others. However, on the five datasets reported in the figure, BDMO and BDMO-SA performed better than PSO, whose function evaluation values converge above those of the other two only on the HeartEW and BreastCancer datasets. In all cases where BDMO and BDMO-SA performed well above the others, the fitness values obtained lie above 0.8. This significant classification value confirms that the features selected by the two algorithms on those datasets represent a high-quality selection.
The BDMO and BDMO-SA are reported to have performed well on three of the five datasets used for the plots in Fig. 4. Although the PSO algorithm showed a competitive performance against the two algorithms, we note that this is limited to the SpectEW and M-of-n datasets. Meanwhile, the curves of the other algorithms run below those of PSO, BDMO and BDMO-SA on all five datasets. We found that the five datasets in the figure are computationally demanding, given the unstable performance of the ASGW and RSGW. Figure 5 shows the convergence curves of all algorithms on the PenglungEW, Tic-tac-toe, Wine, and KrVsKpEW datasets. Interestingly, the impact of hybridizing BDMO with the SA algorithm proved outstanding, as the algorithm competes well with PSO in three cases. Despite the computational difficulty that all algorithms experienced with KrVsKpEW, the BDMO-SA algorithm remains ahead of the others, demonstrating that the proposed hybrid algorithm is suitable for selecting the optimal number of features required for solving the classification problem. These outstanding performances are not limited to the KrVsKpEW dataset alone but span all datasets considered in this study.
Considering the outstanding and competitive performance of the BDMSAO and BDMO algorithms, as reported in the previous paragraphs, the plots of accuracy concentrate only on these two algorithms for convenient comparative analysis. In Figs. 6, 7, 8 and 9, plots of accuracy against iteration are presented for BDMSAO and BDMO on all datasets. In Fig. 6, the accuracy obtained by BDMSAO on all five datasets indicates that it performs better than the base algorithm, the BDMO. On some datasets, where the BDMSAO curve rises, that of the BDMO drops before rising again. Meanwhile, all accuracies for the datasets considered rose above 0.9, with those of Exactly and Vote staying at 1.0 for all iterations. Figure 7 shows the accuracy curves obtained for the HeartEW, BreastEW, BreastCancer, and Lymphography datasets over all iterations. As in the previous discussion, the hybrid BDMSAO method does well on all datasets when compared with BDMO.
The accuracy values obtained on the waveformEW, sonar, SpaceEW, M-of-n, and Zoo datasets are plotted in Fig. 8. There is a significant difference between the performance of BDMSAO and that of its base algorithm. In most cases, the accuracy values of BDMSAO rose to 1.0, whereas those of BDMO lay below 0.8. Again, this result demonstrates that the accuracy of the feature selection results discussed in previous sections is consistent and represents an outstanding performance of the new hybrid method. Figure 9 compares the curves for the PenglungEW, Tic-tac-toe, Wine, and KrVsKpEW datasets. The BDMO performance on these datasets rose above what was observed on other datasets, though it still lags behind that of the hybrid BDMSAO proposed and implemented in this study.
Comparison. The earlier discussion shows that the BDMSAO outperforms the BDMO. In this subsection, the performance of the proposed hybrid method is compared with nine other state-of-the-art methods, seven of which are hybrid methods. The methods compared are the adaptive switching grey-whale optimizer (ASGW), social ski driver algorithm and late acceptance hill-climbing (SSDs + LAHC) 83, serial grey-whale optimizer (HSGW), embedded chaotic whale survival algorithm (ECWSA-4) 84, binary GA, random switching grey-whale optimizer (RSGW), electrical harmony-based metaheuristic (EHHM), BPSO, and binary simulated normal distribution optimizer (BSNDO) 1.
Given the results in Table 5, the BDMSAO and BSNDO yield better results than the other methods on 11 of the 18 datasets (61.11%). The BDMSAO produced 100% accuracy on 9 of the 18 datasets (50%), while its closest competitor, the BSNDO, produced 100% accuracy on 8 of the 18 datasets (44.4%). On the Breastcancer dataset, the BDMSAO, BSNDO, and EHHM achieved 100% accuracy. On the BreastEW dataset, the proposed BDMSAO, together with BSNDO and SSDs + LAHC, produced the second-best result after ASGW and EHHM. On the CongressEW dataset, BDMSAO and EHHM came third after BSNDO
Statistical test. Table 7 shows the Friedman mean ranking test results, with the best-ranked algorithm in bold. In most cases, the BPSO ranked highest, and our proposed method ranked second in all cases where the BPSO ranked first. However, the BDMSAO ranked better than the BPSO on the Exactly and Exactly2 datasets and tied with the BPSO on two datasets, i.e., PenglungEW and Sonar. It also outranks the other hybrid methods in the ranking. This shows statistically the performance advantage of the BDMSAO over other algorithms. On average, the BDMSAO and the other algorithms show the same statistical significance on all measures on most of the datasets used, producing significance values less than 0.05. The value 0.05 represents the 5% significance level, the threshold used to decide whether to reject the null hypothesis. Our proposed method generated more values below 0.05 than all other algorithms on all datasets, except for the BPSO. Values below this threshold lead to rejecting the null hypothesis that the samples were drawn from continuous distributions with the same medians. This supports the claim that the results obtained by the BDMSAO are statistically significant compared to other similar methods. Table 8 presents the Wilcoxon mean rank test.
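The mean ranking used in the Friedman test can be illustrated with a minimal sketch. The accuracy values and the placeholder algorithms A, B and C below are hypothetical and are not taken from Table 7; the sketch also assumes that rank 1 denotes the best method on a dataset and that tied methods share the average of the tied positions.

```python
from statistics import mean

def average_ranks(scores, higher_is_better=True):
    """Rank one dataset's scores (1 = best); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def friedman_mean_ranks(table):
    """table[d][a] = score of algorithm a on dataset d; returns mean rank per algorithm."""
    per_dataset = [average_ranks(row) for row in table]
    return [mean(col) for col in zip(*per_dataset)]

# Hypothetical accuracies (NOT from the paper): rows are datasets,
# columns are three placeholder algorithms A, B, C.
toy = [
    [1.00, 0.95, 0.90],
    [0.98, 0.98, 0.91],
    [0.97, 0.99, 0.93],
]
print(friedman_mean_ranks(toy))
```

In this toy table, A and B obtain the same mean rank because each wins one dataset and they tie on another, which mirrors the kind of tie reported between the BDMSAO and the BPSO on some datasets.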

Testing on high-dimensional datasets
The results discussed above show how the BDMSAO performs against the other well-known algorithms used in this study. To evaluate the robustness of the proposed algorithm, we tested it on three high-dimensional datasets known to be extremely challenging. The datasets are described in Table 9. The efficacy of the BDMSAO is also demonstrated in comparison with eight (8) well-known state-of-the-art feature selection methods listed in Table 10. The number of features selected by BDMSAO compared with some popular FS methods is shown in Figs. 10, 11, and 12. All algorithms, including the BDMSAO, yielded the highest classification accuracy on all three high-dimensional datasets. The BDMSAO selected the fewest features on the colon and leukemia datasets compared to the AIEOU, SFO and BSNDO, confirming the proposed method's ability to select the fewest features, as indicated previously. However, on the Prostate_GE dataset, the BDMSAO selected the third-smallest number of features, after RTHS and AIEOU.

Conclusion and future work
This paper proposed a new hybridized method for solving feature selection problems, called the BDMSAO. The hybridization concept emanates from the methodological enhancement of the standard BDMO and SA algorithms. The BDMSAO uses the SA as a local search method to enhance the exploitation of the BDMO and to achieve a suitable balance between exploitation and exploration in the hybrid method. The BDMSAO achieved a substantial improvement in classification accuracy on feature selection problems over the BDMO and the other well-known state-of-the-art algorithms used for comparison in this study. The performance of the proposed approach was assessed and compared against nine other feature selection methods: the ASGW, BSNDO, HSGW, BPSO, BGA, RSGW, SSDs + LAHC, ECWSA-4, and EHHM. The evaluation criteria reported for each approach include the classification accuracy, the average number of selected features, and the convergence characteristics of the respective algorithms. Similarly, the BDMSAO was compared against the BDMO algorithm to ascertain the validity of the claimed enhancement over the BDMO.
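The role of the SA as a local search around a candidate feature subset can be sketched as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation: the toy objective, the names `fitness` and `sa_refine`, the weight `alpha`, the single-bit-flip neighbourhood, and the geometric cooling schedule are all hypothetical choices. In a real wrapper setting, the error term would come from a classifier evaluated on held-out data.

```python
import math
import random

def fitness(mask, relevant, alpha=0.99):
    """Toy stand-in for a wrapper objective (lower is better).

    ASSUMPTION: the 'error' is the fraction of relevant features left out;
    the second term penalizes the fraction of features selected.
    """
    if not any(mask):
        return 1.0  # an empty subset is treated as the worst case
    missed = sum(1 for i in relevant if not mask[i]) / len(relevant)
    size = sum(mask) / len(mask)
    return alpha * missed + (1 - alpha) * size

def sa_refine(mask, relevant, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Simulated annealing around `mask` using single-bit flips."""
    rng = random.Random(seed)
    current, cur_fit = list(mask), fitness(mask, relevant)
    best, best_fit = list(current), cur_fit
    temp = t0
    for _ in range(steps):
        neighbor = list(current)
        j = rng.randrange(len(neighbor))
        neighbor[j] = 1 - neighbor[j]  # flip one feature in or out
        n_fit = fitness(neighbor, relevant)
        delta = n_fit - cur_fit
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_fit = neighbor, n_fit
            if cur_fit < best_fit:
                best, best_fit = list(current), cur_fit
        temp *= cooling  # geometric cooling (hypothetical schedule)
    return best, best_fit

start = [1] * 10  # hypothetical candidate handed over by the global search
refined, score = sa_refine(start, relevant=[0, 3, 7])
print(refined, score)
```

Because the best subset seen so far is tracked separately from the current one, the returned subset is never worse than the starting candidate under the toy objective, even though worse moves may be accepted along the way.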
The developed feature selection approach was mainly evaluated and validated on UCI datasets confirmed to be challenging. The new method was also tested on three high-dimensional datasets to assess its robustness in finding reasonable solutions to real-world problems that are often considered complex and difficult to solve using conventional methods. The results obtained by the BDMSAO indicate that the proposed method is applicable to various publicly available datasets. A limitation of this study may be the computational complexity introduced by the addition of a local search technique.
In the future, it would be interesting to consider hybridizing the BDMO with other state-of-the-art metaheuristics such as the GA, PSO, CSO, GWO, PDO and KHA algorithms. Also, it will be worth considering employing the hybrid BDMSAO algorithm in other real-world problem areas like image processing, facial

Table 9. High-dimensional datasets and their properties.